Decitabine treatment RNA-Seq time-series experiments¶

In order to test for any differences over multiple time points, once can use a design including the time factor, and then test using the likelihood ratio test (LRT). Here, as we have control (DMSO) and treatment (Decitabine) time series, design formula containing the condition factor, the time factor, and the interaction of the two. In this case, using the likelihood ratio test with a reduced model which does not contain the interaction terms will test whether the condition induces a change in gene expression at any time point after the reference level time point (time 0).

(see DESeq2 Time-series-experiments for more details)

Pre-processing¶

I'm mapping hg38/gencode.v34 to the fastq files using salmon.

Raw samples:

InĀ [10]:
ls ../hl60-fastq/*fastq.gz
../hl60-fastq/120h_DMSO_rep1.fastq.gz*
../hl60-fastq/120h_DMSO_rep2.fastq.gz*
../hl60-fastq/120h_treated_rep1.fastq.gz*
../hl60-fastq/120h_treated_rep2.fastq.gz*
../hl60-fastq/6h_DMSO_rep1.fastq.gz*
../hl60-fastq/6h_DMSO_rep2.fastq.gz*
../hl60-fastq/6h_treated_rep1.fastq.gz*
../hl60-fastq/6h_treated_rep2.fastq.gz*
../hl60-fastq/72h_DMSO_rep1.fastq.gz*
../hl60-fastq/72h_DMSO_rep2.fastq.gz*
../hl60-fastq/72h_treated_rep1.fastq.gz*
../hl60-fastq/72h_treated_rep2.fastq.gz*
InĀ [37]:
%%R 
# meta 
treats  <- rep(c(rep('DMSO',2), rep('treated',2)),3)
reps    <- rep(c('rep1','rep2'),6)
hours   <- c(rep('120h',4),rep('6h',4),rep('72h',4))
colData <- data.frame(
    time=hours, 
    cond=treats, 
    sample_id=paste(hours, treats, reps, sep='_'),
    row.names=colnames(txi$abundance))
colData
                  time    cond         sample_id
120h_DMSO_rep1    120h    DMSO    120h_DMSO_rep1
120h_DMSO_rep2    120h    DMSO    120h_DMSO_rep2
120h_treated_rep1 120h treated 120h_treated_rep1
120h_treated_rep2 120h treated 120h_treated_rep2
6h_DMSO_rep1        6h    DMSO      6h_DMSO_rep1
6h_DMSO_rep2        6h    DMSO      6h_DMSO_rep2
6h_treated_rep1     6h treated   6h_treated_rep1
6h_treated_rep2     6h treated   6h_treated_rep2
72h_DMSO_rep1      72h    DMSO     72h_DMSO_rep1
72h_DMSO_rep2      72h    DMSO     72h_DMSO_rep2
72h_treated_rep1   72h treated  72h_treated_rep1
72h_treated_rep2   72h treated  72h_treated_rep2

PCA¶

Initial principal component analysis (PCA) shows the second treated biological replicate at 72h time point, behaves as an outlier. Removing that from the analysis give us a better representation of our dataset. In the second PCA plot, we can see that treated samples at 6h cluster with the non-treated samples which suggest that 6 hours treatment with the drug is not as effective as 72h and 120h. Although, we will check the variant genes in this time-point in the following statistical analysis.

InĀ [40]:
%%R 
dds.pca <- DESeq(dds0, parallel=TRUE)
# results 
vsd <- varianceStabilizingTransformation(dds.pca)

pca = plot_PCA(vsd, colData)
pca
R[write to console]: estimating size factors

R[write to console]: using 'avgTxLength' from assays(dds), correcting for library size

R[write to console]: estimating dispersions

R[write to console]: gene-wise dispersion estimates: 4 workers

R[write to console]: mean-dispersion relationship

R[write to console]: final dispersion estimates, fitting model and testing: 4 workers

InĀ [47]:
%%R
p / (p.U + p.T)

Differential expression analysis¶

I'm doing differential expression analysis at 6h,72h and 120h plus time-independent comparison. Volcano-plot for all four tests included. Notably, I'm replacing results for time dependent tests with missing values or less than 0.05 adjust p-values with results from time independent test.

In addition, I use time as continues variable to run seprate analysis.

First scenario¶

Time as factor variable

InĀ [51]:
%%R
resultsNames(dds)
[1] "Intercept"            "cond_treated_vs_DMSO" "time_120h_vs_6h"     
[4] "time_72h_vs_6h"       "condtreated.time120h" "condtreated.time72h" 
InĀ [55]:
%%R
vol = vol.6h / vol.72h / vol.120h
vol

Heatmap¶

Cluster high variant genes over time.

InĀ [93]:
%%R 
library(pheatmap)
# mostVar Calculate the top n most variable genes in a matrix of gene expression data
# https://rdrr.io/github/abc-igmm/transcripTools/man/mostVar.html
mostVar <- function(data, n, i_want_most_var = FALSE) {
  data.var <- apply(data, 1, stats::var)
  data[order(data.var, decreasing = i_want_most_var)[1:n],] 
}

fc <- df[!duplicated(df$gene_name),c('gene_name','log2FC_6h','log2FC_72h','log2FC_120h')] %>% remove_rownames %>% column_to_rownames('gene_name')
# scale - Z-Score
fc <- data.frame(apply(fc,2,scale, center=TRUE, scale=TRUE), row.names=rownames(fc)) 

# filter most variable genes
fc = mostVar(fc,30)
# Plot heatmap
h1 <- pheatmap(fc,cluster_cols = FALSE,angle_col=45)
h1

human_ensembl

human_ensembl_msigdb_c1

human_ensembl_msigdb_c2

human_ensembl_msigdb_c3

human_ensembl_msigdb_c4

human_ensembl_msigdb_c5

human_ensembl_msigdb_c6

human_ensembl_msigdb_c7

human_ensembl_msigdb_full

human_ensembl_msigdb_h

human_ensembl_RBPs_coding_gene_ids

human_ensembl_RBPs_coding_gene_ids_by_3UTR

human_ensembl_RBPs_coding_gene_ids_by_5UTR

human_ensembl_RBPs_coding_gene_ids_by_coding_exons

InĀ [46]:
%%R
sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS

Matrix products: default
BLAS/LAPACK: /rumi/shams/abe/anaconda3/envs/deseq/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
 [1] stats4    parallel  tools     stats     graphics  grDevices utils    
 [8] datasets  methods   base     

other attached packages:
 [1] patchwork_1.0.0             DESeq2_1.20.0              
 [3] SummarizedExperiment_1.10.1 DelayedArray_0.6.6         
 [5] BiocParallel_1.14.2         matrixStats_0.55.0         
 [7] ggrepel_0.8.1               forcats_0.4.0              
 [9] stringr_1.4.0               dplyr_0.8.3                
[11] purrr_0.3.3                 readr_1.3.1                
[13] tidyr_1.0.0                 tibble_2.1.3               
[15] ggplot2_3.2.1               tidyverse_1.2.1            
[17] tximport_1.10.1             GenomicFeatures_1.32.3     
[19] AnnotationDbi_1.42.1        Biobase_2.40.0             
[21] GenomicRanges_1.32.7        GenomeInfoDb_1.16.0        
[23] IRanges_2.14.12             S4Vectors_0.18.3           
[25] BiocGenerics_0.26.0        

loaded via a namespace (and not attached):
 [1] nlme_3.1-141             bitops_1.0-6             lubridate_1.7.4         
 [4] bit64_0.9-7              RColorBrewer_1.1-2       progress_1.2.2          
 [7] httr_1.4.1               backports_1.1.5          R6_2.4.0                
[10] rpart_4.1-15             Hmisc_4.2-0              DBI_1.0.0               
[13] lazyeval_0.2.2           colorspace_1.4-1         nnet_7.3-12             
[16] withr_2.1.2              gridExtra_2.3            tidyselect_0.2.5        
[19] prettyunits_1.0.2        bit_1.1-14               compiler_3.5.1          
[22] cli_1.1.0                rvest_0.3.5              htmlTable_1.13.2        
[25] xml2_1.2.2               labeling_0.3             rtracklayer_1.40.6      
[28] checkmate_1.9.4          scales_1.0.0             genefilter_1.62.0       
[31] digest_0.6.22            Rsamtools_1.32.3         foreign_0.8-72          
[34] XVector_0.20.0           htmltools_0.4.0          base64enc_0.1-3         
[37] pkgconfig_2.0.3          htmlwidgets_1.5.1        rlang_0.4.1             
[40] readxl_1.3.1             rstudioapi_0.10          RSQLite_2.1.2           
[43] jsonlite_1.6             acepack_1.4.1            RCurl_1.95-4.12         
[46] magrittr_1.5             GenomeInfoDbData_1.1.0   Formula_1.2-3           
[49] Matrix_1.2-17            Rcpp_1.0.3               munsell_0.5.0           
[52] lifecycle_0.1.0          stringi_1.4.3            zlibbioc_1.26.0         
[55] grid_3.5.1               blob_1.2.0               crayon_1.3.4            
[58] lattice_0.20-38          Biostrings_2.48.0        haven_2.2.0             
[61] splines_3.5.1            annotate_1.58.0          hms_0.5.2               
[64] locfit_1.5-9.1           knitr_1.25               zeallot_0.1.0           
[67] pillar_1.4.2             geneplotter_1.58.0       biomaRt_2.36.1          
[70] XML_3.98-1.20            glue_1.3.1               latticeExtra_0.6-28     
[73] data.table_1.11.6        modelr_0.1.5             vctrs_0.2.0             
[76] cellranger_1.1.0         gtable_0.3.0             assertthat_0.2.1        
[79] xfun_0.10                xtable_1.8-4             broom_0.5.0             
[82] survival_2.44-1.1        GenomicAlignments_1.16.0 memoise_1.1.0           
[85] cluster_2.1.0